pycurl/curl not following the CURLOPT_TIMEOUT option


Problem description

I have a multi-threaded script that occasionally freezes when it connects to a server and the server doesn't send anything back. netstat shows a connected TCP socket. This happens even though I have TIMEOUT set. The timeout works fine in a single-threaded script. Here's some sample code.

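import StringIO  # Python 2; on Python 3 use io.BytesIO instead
import threading
import pycurl

def xmlscraper(url):
    htmlpage = StringIO.StringIO()  # collects the response body
    rheader = StringIO.StringIO()   # collects the response headers
    c = pycurl.Curl()
    c.setopt(pycurl.USERAGENT, "user agent string")
    c.setopt(pycurl.CONNECTTIMEOUT, 60)  # seconds allowed for the connect phase
    c.setopt(pycurl.TIMEOUT, 120)        # seconds allowed for the whole transfer
    c.setopt(pycurl.FOLLOWLOCATION, 1)
    c.setopt(pycurl.WRITEFUNCTION, htmlpage.write)
    c.setopt(pycurl.HEADERFUNCTION, rheader.write)
    c.setopt(pycurl.HTTPHEADER, ['Expect:'])
    c.setopt(pycurl.NOSIGNAL, 1)  # required when libcurl is used from threads
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.HTTPGET, 1)
    c.perform()  # not shown in the original snippet; the transfer only runs here
    c.close()

pycurl.global_init(pycurl.GLOBAL_ALL)
for url in urllist:  # urllist is defined elsewhere in the script
    t = threading.Thread(target=xmlscraper, args=(url,))
    t.start()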

Any help would be greatly appreciated! I've been trying to solve this for a few weeks now.

Edit: urllist has about 10 URLs. It doesn't seem to matter how many there are.

Edit 2: I just tested the code below against a PHP script that sleeps for 100 seconds.

import threading
import pycurl
def testf():
    c = pycurl.Curl()
    c.setopt(pycurl.CONNECTTIMEOUT, 3)
    c.setopt(pycurl.TIMEOUT, 6)
    c.setopt(pycurl.NOSIGNAL, 1)
    c.setopt(pycurl.URL, 'http://xxx.xxx.xxx.xxx/test.php')
    c.setopt(pycurl.HTTPGET, 1)
    c.perform()
t = threading.Thread(target=testf)
t.start()
t.join()

pycurl seems to time out properly in that code. So I guess it has something to do with the number of URLs? The GIL?

Edit 3:

I think it might have to do with libcurl itself, because sometimes when I check the script, libcurl is still connected to a server hours later. If pycurl were timing out properly, the socket would have been closed.


Reference solutions

Method 1:

I modified your 'edit2' code to spawn multiple threads, and it works fine on my machine (Ubuntu 10.10 with Python 2.6.6).

import threading
import pycurl

def testf():
    c = pycurl.Curl()
    c.setopt(pycurl.CONNECTTIMEOUT, 3)
    c.setopt(pycurl.TIMEOUT, 3)
    c.setopt(pycurl.NOSIGNAL, 1)
    c.setopt(pycurl.URL, 'http://localhost/cgi-bin/foo.py')
    c.setopt(pycurl.HTTPGET, 1)
    c.perform()

for i in range(100):
    t = threading.Thread(target=testf)
    t.start()

I can spawn 100 threads and they all time out after 3 seconds (as I specified).

I wouldn't go blaming the GIL and thread contention yet :)
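To turn "all time out after 3 seconds" into something measurable, a small harness can time each worker and count threads that are still running after an overall deadline. The following is a minimal sketch, not part of the original answer: `run_with_deadline` is a hypothetical helper name, it is written for Python 3 (it uses `time.monotonic`), and the demo substitutes `time.sleep` for a real request so that it runs without a server.

```python
import threading
import time

def run_with_deadline(target, count, deadline):
    """Run `target` in `count` threads (hypothetical helper).

    Returns (elapsed, stuck): the per-thread run times of the workers that
    finished, and the number of threads still alive after `deadline` seconds.
    """
    elapsed = []
    lock = threading.Lock()

    def timed():
        start = time.monotonic()
        try:
            target()
        except Exception:
            pass  # an error raised on timeout still counts as "finished"
        finally:
            with lock:
                elapsed.append(time.monotonic() - start)

    # Daemon threads so a genuinely hung worker cannot block interpreter exit.
    threads = [threading.Thread(target=timed, daemon=True) for _ in range(count)]
    for t in threads:
        t.start()

    end = time.monotonic() + deadline
    for t in threads:
        t.join(max(0.0, end - time.monotonic()))

    stuck = sum(t.is_alive() for t in threads)
    with lock:
        return list(elapsed), stuck

if __name__ == "__main__":
    # Stand-in worker: pretends to be a request that gives up after 0.2 s.
    times, stuck = run_with_deadline(lambda: time.sleep(0.2), 20, deadline=2.0)
    print("finished:", len(times), "stuck:", stuck)
```

Passing `testf` from the code above as `target`, with a deadline a little above the configured TIMEOUT, would report any thread that outlives its timeout as stuck.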

Method 2:

Python threads are hamstrung, in some situations, by the Global Interpreter Lock (the "GIL"). It may be that the threads you're starting aren't timing out because they're not actually being run often enough.
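One way to check the GIL hypothesis is to see whether blocking network reads in many threads still time out on schedule. CPython releases the GIL around blocking socket calls, and pycurl's perform() is generally understood to do the same, though the sketch below does not exercise pycurl. It is an illustration with assumed names (`start_silent_listener`, `stalled_read`, `measure`), written for Python 3 with the standard library only: a local listener never sends data, and each client thread waits on it with a socket timeout.

```python
import socket
import threading
import time

def start_silent_listener(backlog=128):
    """Open a local listening socket that never accepts or sends anything.

    The kernel completes the TCP handshake for queued connections, so each
    client sees a connected socket with a peer that stays silent.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(backlog)
    return srv, srv.getsockname()[1]

def stalled_read(port, timeout, results):
    """Connect, wait for data that never comes, and record what happened."""
    start = time.monotonic()
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as s:
            s.recv(1)  # blocks until the socket timeout fires
        outcome = "data"
    except socket.timeout:
        outcome = "timeout"
    results.append((outcome, time.monotonic() - start))

def measure(n_threads=20, timeout=0.5):
    """Run n_threads stalled reads at once; return (results, wall_seconds)."""
    srv, port = start_silent_listener()
    results = []
    threads = [
        threading.Thread(target=stalled_read, args=(port, timeout, results))
        for _ in range(n_threads)
    ]
    begin = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall = time.monotonic() - begin
    srv.close()
    return results, wall

if __name__ == "__main__":
    results, wall = measure()
    print("outcomes:", sorted({outcome for outcome, _ in results}))
    print("wall time: %.2f s" % wall)
```

If the GIL were starving the threads, the wall time would grow with the number of threads. When the waits overlap, it stays close to a single timeout, which suggests looking elsewhere for the cause of a hang.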

This related StackOverflow question might point you in the right direction:

(by Incognito, Corey Goldberg, Brian Clapper)

References

  1. pycurl/curl not following the CURLOPT_TIMEOUT option (CC BY-SA 3.0/4.0)

#pycurl #Python #timeout #multithreading





